More Instruction Level Parallelism Explains the Actual Efficiency of Compensated Algorithms

Authors

  • Philippe Langlois
  • Nicolas Louvet
Abstract

The compensated Horner algorithm and the Horner algorithm with double-double arithmetic improve the accuracy of polynomial evaluation in IEEE-754 floating point arithmetic. Both yield a polynomial evaluation as accurate as if it were computed with the classic Horner algorithm in twice the working precision. Both algorithms also share the same low-level computation of the floating point rounding errors and cost a similar number of floating point operations. We report numerical experiments showing that the compensated algorithm runs at least twice as fast as the double-double one on modern processors. We explain this efficiency by identifying more instruction level parallelism in the compensated implementation. This property also applies to other compensated algorithms for summation, dot product and triangular linear system solving. More generally, this paper illustrates how this kind of performance analysis may be useful to highlight the actual efficiency of numerical algorithms.
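To make the abstract concrete, here is a minimal Python sketch of a compensated Horner evaluation. It is not taken from the paper: the function names (`two_sum`, `two_prod`, `comp_horner`) and the coefficient ordering (highest degree first) are assumptions chosen for illustration. The error-free transformations are the textbook ones of Knuth and Dekker, and the product uses Veltkamp splitting rather than a fused multiply-add. The double-double variant is not shown.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and a + b == s + e exactly (Knuth)."""
    s = a + b
    z = s - a
    e = (a - (s - z)) + (b - z)
    return s, e


def split(a):
    """Veltkamp split of a double into high and low parts, a == hi + lo."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and a * b == p + e exactly (Dekker).

    Assumes no overflow or underflow occurs.
    """
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def horner(coeffs, x):
    """Classic Horner scheme; coeffs are ordered from highest degree down."""
    r = coeffs[0]
    for a in coeffs[1:]:
        r = r * x + a
    return r


def comp_horner(coeffs, x):
    """Compensated Horner: classic Horner plus a running error correction."""
    r = coeffs[0]
    c = 0.0  # accumulated correction, evaluated by its own Horner recurrence
    for a in coeffs[1:]:
        p, pi = two_prod(r, x)      # rounding error of the product
        r, sigma = two_sum(p, a)    # rounding error of the sum
        c = c * x + (pi + sigma)
    return r + c


if __name__ == "__main__":
    # Hypothetical test case: (x - 1)^5 in expanded form, evaluated near 1.
    coeffs = [1.0, -5.0, 10.0, -10.0, 5.0, -1.0]
    x = 1.01
    exact = sum(Fraction(a) * Fraction(x) ** k
                for k, a in enumerate(reversed(coeffs)))
    for name, f in (("horner", horner), ("comp_horner", comp_horner)):
        rel = abs(Fraction(f(coeffs, x)) - exact) / abs(exact)
        print(name, float(rel))
```

In this sketch the correction `c` follows its own recurrence and never feeds back into `r`, so the two chains of operations can in principle overlap on a superscalar processor. Pure Python will not show that effect; the sketch only illustrates the arithmetic.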


Similar resources

Algorithmes compensés en arithmétique flottante : précision, validation, performances. (Compensated algorithms in floating point arithmetic : accuracy, validation, performances)

Rounding error may totally corrupt the result of a floating point computation. How to improve and validate the accuracy of a floating point computation, without large computing time overheads? We contribute to this question considering two examples: polynomial evaluation and linear triangular system solving. In both cases we use the compensation of the rounding errors to improve the ac...


Re-evaluating Mpeg Motion Compensation Search Criteria

As processors evolve from simple scalar machines to more advanced designs using dynamic instruction scheduling, speed critical algorithms should be reexamined to ensure that the optimal strategy is being used. One important example of this class of algorithms is MPEG encoding. Although the Mean Absolute Distance approach is recognized in the literature as more efficient in terms of speed, our res...


User Input User Input User Input Probed Program Probed Executable Profile Information Probing Library

Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level paralleli...


Performance Evaluation of Core Numerical Algorithms: A Tool to Measure Instruction Level Parallelism

We measure and analyze the instruction level parallelism which conditions the running-time performance of core numerical subroutines. We propose PerPI, a programmer oriented tool to fill the gap between high level algorithm analysis and machine dependent profiling tools and which provides reproducible results.


Instruction Scheduling and Executable Editing

Modern microprocessors offer more instruction-level parallelism than most programs and compilers can currently exploit. The resulting disparity between a machine’s peak and actual performance, while frustrating for computer architects and chip manufacturers, opens the exciting possibility of low-cost or even no-cost instrumentation for measurement, simulation, or emulation. Instrumentation code...




Publication date: 2007